What is claimed is:

1. A method comprising:
receiving a first program unit in a parallel computing environment, the first program unit including a reduction operation associated with a set of variables;
translating the first program unit into a second program unit, the second program unit to associate the reduction operation with a set of one or more instructions operative to partition the reduction operation between a plurality of threads including at least two threads; and
translating the first program unit into a third program unit, the third program unit to associate the reduction operation with a set of one or more instructions operative to perform an algebraic operation on the variables.

2. The method of claim 1 further comprising encapsulating the reduction operation with the instructions associated with the third program unit.

3. The method of claim 1 further comprising reducing the variables logarithmically.

4. The method of claim 1 further comprising translating the first program unit into the second program unit utilizing, in part, a source-code to source-code translator.
5. The method of claim 1 further comprising translating the first program unit into the third program unit utilizing, in part, a source-code to source-code translator.

6. The method of claim 1 further comprising associating the plurality of threads each with a unique portion of the set of variables.

7. The method of claim 6 further comprising combining, in part, the variables associated with the plurality of threads in a pair-wise reduction operation.

8. An apparatus comprising:
a memory including a shared memory location;
a translation unit coupled with the memory, the translation unit to translate a first program unit including a reduction operation associated with a set of at least two variables into a second program unit, the second program unit to associate the reduction operation with one or more instructions operative to partition the reduction operation between a plurality of threads including at least two threads;
a compiler unit coupled with the translation unit and the shared-memory, the compiler unit to compile the second program unit; and
a linker unit coupled with the compiler unit and the shared-memory, the linker unit to link the compiled second program with a library.

9. The apparatus of claim 8 wherein the second program unit associates a set of one or more instructions with the reduction operative to encapsulate the reduction operation.

10. The apparatus of claim 8 wherein the variables in the set of variables are each uniquely associated with the plurality of threads and the library includes instructions operative to combine, in part, the variables associated with the plurality of threads.



11. The apparatus of claim 10 wherein the library includes instructions operative to combine, in part, the variables in a pair-wise reduction.



12. The apparatus of claim 8 further comprising a set of one or more processors to host the plurality of threads, the plurality of threads to execute instructions associated with the second program unit.

13. The apparatus of claim 8 wherein the second program includes a callback routine and the callback routine is associated with instructions operative to perform an algebraic operation on at least two variables in the set of variables.

14. The apparatus of claim 13 wherein the library is operative to call the callback routine to perform, in part, a reduction on at least two variables in the set of variables.

15. A machine-readable medium that provides instructions, that when executed by a set of one or more processors, enable the set of processors to perform operations comprising:
receiving a first program unit in a parallel computing environment, the first program unit including a reduction operation associated with a set of variables;
translating the first program unit into a second program unit, the second program unit to associate the reduction operation with a set of one or more instructions operative to partition the reduction operation between a plurality of threads including at least two threads; and
translating the first program unit into a third program unit, the third program unit to associate the reduction operation with a set of one or more instructions operative to perform an algebraic operation on the variables.
16. The machine-readable medium of claim 15 further comprising encapsulating the reduction operation with a set of one or more instructions.

17. The machine-readable medium of claim 15 further comprising translating the first program unit into the second program unit utilizing, in part, a source-code to source-code translator.

18. The machine-readable medium of claim 15 further comprising reducing the variables, in part, logarithmically.

19. The machine-readable medium of claim 15 further comprising translating the first program unit into the third program unit utilizing, in part, a source-code to source-code translator.

20. The machine-readable medium of claim 15 further comprising the second program unit utilizing, in part, the third program unit to perform a reduction operation on the set of variables.
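
By way of illustration only, and without limiting the claims, the partitioning of a reduction between threads, the pair-wise (logarithmic) combination of per-thread results, and the use of a callback routine for the algebraic operation may be sketched as follows. The sketch is written in Python, which the claims do not name. The names `partitioned_reduce`, `sum_callback`, and `num_threads` are hypothetical, and the contiguous slicing scheme is an assumption made for the example rather than something recited above.

```python
import threading


def sum_callback(a, b):
    # Hypothetical stand-in for the callback routine: the algebraic
    # operation of a sum reduction applied to two variables.
    return a + b


def partitioned_reduce(values, callback=sum_callback, num_threads=4):
    """Reduce `values` with an associative `callback` across several threads."""
    if not values:
        raise ValueError("values must be non-empty")
    n = len(values)
    k = max(1, min(num_threads, n))  # never more threads than variables
    partials = [None] * k

    def worker(tid):
        # Each thread owns a unique, contiguous portion of the variables.
        start, end = tid * n // k, (tid + 1) * n // k
        acc = values[start]
        for v in values[start + 1:end]:
            acc = callback(acc, v)
        partials[tid] = acc

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(k)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Pair-wise combination of the per-thread results. The distance between
    # combined slots doubles each round, so about log2(k) rounds are needed.
    stride = 1
    while stride < k:
        for i in range(0, k - stride, 2 * stride):
            partials[i] = callback(partials[i], partials[i + stride])
        stride *= 2
    return partials[0]


if __name__ == "__main__":
    print(partitioned_reduce(list(range(1, 101))))
```

Because the slices are contiguous and the pair-wise rounds always combine a left result with its right neighbor, the sketch preserves operand order and therefore only requires the callback to be associative, not commutative.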